arXiv.org e-Print Archive

FigShare

Patterns of subnet usage reveal distinct scales of regulation in the transcriptional regulatory network of Escherichia coli

Author: A Travers
C Marr
Carsten Marr
DP Sangurdekar
E Krause
Fabian J. Theis
G Balázsi
H Yu
J Vogel
J Ward Jr
JD Glasner
JJ Faith
Larry S. Liebovitch
M. Madan Babu
Marc-Thorsten Hütt
MJ Herrgard
N Blot
N Sonnenschein
NM Luscombe
O Alter
Q Cui
R Milo
R Milo
RM Gutierrez-Rios
S Gama-Castro
S Gottesman
S Mangan
S Mangan
S Mangan
SS Shen-Orr
T Beissbarth
VG Tusher
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2010

The set of regulatory interactions between genes, mediated by transcription factors, forms a species' transcriptional regulatory network (TRN). By comparing this network with measured gene expression data, one can identify functional properties of the TRN and gain general insight into transcriptional control. We define the subnet of a node as the subgraph consisting of all nodes topologically downstream of the node, including itself. Using a large set of microarray expression data of the bacterium Escherichia coli, we find that the gene expression in different subnets exhibits a structured pattern in response to environmental changes and genotypic mutation. Subnets with fewer changes in their expression pattern have a higher fraction of feed-forward loop motifs and a lower fraction of small RNA targets within them. Our study implies that the TRN consists of several scales of regulatory organization: 1) subnets with more varying gene expression controlled by both transcription factors and post-transcriptional RNA regulation, and 2) subnets with less varying gene expression having more feed-forward loops and less post-transcriptional RNA regulation.

Comment: 14 pages, 8 figures, to be published in PLoS Computational Biology
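The subnet definition in this abstract (a node plus everything topologically downstream of it) amounts to a reachability search on the directed regulator-to-target graph. The following is a minimal sketch of that idea only, not the authors' code; the function name `subnet`, the edge-list representation, and the toy network are assumptions made for illustration.

```python
from collections import deque


def subnet(edges, root):
    """Return the set of nodes reachable from `root`, including `root` itself.

    `edges` is an iterable of (regulator, target) pairs (assumed representation).
    """
    # Build an adjacency list: regulator -> list of direct targets.
    targets = {}
    for regulator, target in edges:
        targets.setdefault(regulator, []).append(target)

    # Breadth-first search over downstream nodes.
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in targets.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical toy network: A regulates B and C, B regulates C, D regulates C.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "C")]
print(sorted(subnet(toy_edges, "A")))  # ['A', 'B', 'C']
```

In this toy network the edges A to B, B to C, and A to C also form a feed-forward loop, the motif the abstract reports as enriched in subnets with less varying expression.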

City University of New York

arXiv.org e-Print Archive

PuSH

Interpreting 16S metagenomic data without clustering to achieve sub-OTU resolution

Author: A Klindworth
A Shade
A Shade
AM Eren
BJ Haas
C Huttenhower
C Lozupone
C Quince
C Quince
DE Hunt
DN Fredricks
EK Costello
EK Costello
H Ochman
JG Caporaso
JG Caporaso
JI Prosser
JJ Faith
JL VandeWalle
JR Brestoff
M Hamady
MGI Langille
Mikhail Tikhonov
MJ Morgan
MJ Rosen
N Fierer
N Kamada
ND Youngblut
Ned S Wingreen
O Lukjancenko
PD Schloss
PD Schloss
PD Schloss
PJ Turnbaugh
RC Edgar
RC Edgar
RC Edgar
Robert W Leach
SJ Song
SM Huse
SP Preheim
TP Tourova
V Kunin
WJ Sul
Y Huang
ZJ Zheng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 11/07/2014

The standard approach to analyzing 16S tag sequence data, which relies on clustering reads by sequence similarity into Operational Taxonomic Units (OTUs), underexploits the accuracy of modern sequencing technology. We present a clustering-free approach to multi-sample Illumina datasets that can identify independent bacterial subpopulations regardless of the similarity of their 16S tag sequences. Using published data from a longitudinal time-series study of human tongue microbiota, we are able to resolve, within standard 97% similarity OTUs, up to 20 distinct subpopulations, all ecologically distinct but with 16S tags differing by as little as 1 nucleotide (99.2% similarity). A comparative analysis of oral communities of two cohabiting individuals reveals that most such subpopulations are shared between the two communities at 100% sequence identity, and that dynamical similarity between subpopulations in one host is strongly predictive of dynamical similarity between the same subpopulations in the other host. Our method can also be applied to samples collected in cross-sectional studies and can be used with the 454 sequencing platform. We discuss how the sub-OTU resolution of our approach can provide new insight into factors shaping community assembly.

Comment: Updated to match the published version. 12 pages, 5 figures + supplement. Significantly revised for clarity, references added, results not changed
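To make the similarity figures in this abstract concrete, the sketch below computes percent identity between two aligned, equal-length tags and compares it with the 97% OTU threshold. It is an illustration only: the function name `percent_identity`, the synthetic sequences, and the tag length of 125 nucleotides are assumptions (the length is chosen so that a single mismatch gives exactly 99.2%; the abstract does not state the tag length).

```python
def percent_identity(a, b):
    """Percent of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Two hypothetical 125-nt tags that differ only at the final position.
tag_a = "ACGT" * 31 + "A"
tag_b = "ACGT" * 31 + "C"

identity = percent_identity(tag_a, tag_b)
print(round(identity, 1))   # 99.2
print(identity >= 97.0)     # True: both tags would fall into one 97% OTU
```

Two tags this similar would be merged by 97% clustering, which is why the abstract argues that OTU clustering can hide ecologically distinct subpopulations.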

Princeton University Open Access Repository

Query Large Scale Microarray Compendium Datasets Using a Model-Based Bayesian Approach with Variable Selection

Author: A Gelman
A Tanay
AA Margolin
AB Brinkman
AB Owen
AE Gelfand
AF Neuwald
AJ Butte
CJ Wolfe
D Ghosh
DE Bassett Jr
DJ Lockhart
FP Roth
G Getz
GJ McLachlan
H Salgado
J Qian
J Quackenbush
JJ Faith
JJ Faith
JS Liu
KY Yeung
M Medvedovic
M Schena
MA Hibbs
MB Eisen
MG Walker
Ming Hu
ML Urbanowski
Neil Hall
P Tamayo
PO Brown
Q Sheng
R Chen
S Kim
SC Madeira
SK Kim
T Dhollander
TF Smith
TH Tani
TR Hughes
VK Mootha
Y Cheng
Zhaohui S. Qin
ZS Qin
Publication venue: Public Library of Science
Publication date: 13/02/2009

In microarray gene expression data analysis, it is often of interest to identify genes that share similar expression profiles with a particular gene such as a key regulatory protein. Multiple studies have been conducted using various correlation measures to identify co-expressed genes. While these measures work well for small datasets, the heterogeneity introduced by increased sample size inevitably reduces their sensitivity and specificity, because most co-expression relationships do not extend to all experimental conditions. With the rapid increase in the size of microarray datasets, identifying functionally related genes from large and diverse microarray gene expression datasets is a key challenge. We develop a model-based gene expression query algorithm built under the Bayesian model selection framework. It is capable of detecting co-expression profiles under a subset of samples/experimental conditions. In addition, it allows linearly transformed expression patterns to be recognized and is robust against sporadic outliers in the data. Both features are critically important for increasing the power of identifying co-expressed genes in large scale gene expression datasets. Our simulation studies suggest that this method outperforms existing correlation coefficients or mutual information-based query tools. When we apply this new method to the Escherichia coli microarray compendium data, it identifies a majority of known regulons as well as novel potential target genes of numerous key transcription factors.

A Relative Variation-Based Method to Unraveling Gene Regulatory Networks

Author: A Greenfield
A Madar
A Pinna
AA Margolin
BE Perrin
D Marbach
D Marbach
F Ferrazzi
Frank Emmert-Streib
H de Jong
I Cantone
J Schäfer
JJ Faith
JJ Rice
JM Lattin
KM Zhou
KY Yip
L Ljung
PE Meyer
R Opgen-Rhein
RJ Prill
S Martin
T Akutsu
T Schaffter
T Zhou
Tong Zhou
TS Gardner
Y Wang
Yali Wang
ZZ Hu
Publication venue: Public Library of Science
Publication date: 20/02/2012

Gene regulatory network (GRN) reconstruction is essential in understanding the functioning and pathology of a biological system. Extensive models and algorithms have been developed to unravel a GRN. The DREAM project aims to clarify both advantages and disadvantages of these methods from an application viewpoint. An interesting yet surprising observation is that, compared with complicated methods such as those based on nonlinear differential equations, methods based on simple statistics, such as the so-called z-score, usually perform better. A fundamental problem with the z-score, however, is that direct and indirect regulations cannot be easily distinguished. To overcome this drawback, a relative expression level variation (RELV) based GRN inference algorithm is suggested in this paper, which consists of three major steps. Firstly, on the basis of wild type and single gene knockout/knockdown experimental data, the magnitude of RELV of a gene is estimated. Secondly, the probability for the existence of a direct regulation from a perturbed gene to a measured gene is estimated, which is further utilized to estimate whether a gene can be regulated by other genes. Finally, the normalized RELVs are modified to make genes with an estimated zero in-degree have smaller RELVs in magnitude than the other genes, which is used afterwards in queuing possibilities of the existence of direct regulations among genes and therefore leads to an estimate of the GRN topology. This method can in principle avoid the so-called cascade errors under certain situations. Computational results with the Size 100 sub-challenges of DREAM3 and DREAM4 show that, compared with the z-score based method, prediction performances can be substantially improved, especially the AUPR specification. Moreover, it can even outperform the best team of both DREAM3 and DREAM4. Furthermore, the high precision of the obtained most reliable predictions shows that the suggested algorithm may be very helpful in guiding biological experiment designs.
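For orientation, the simple-statistics baseline that this abstract compares against can be sketched as follows. This is one common form of a knockout z-score ranking, written under stated assumptions; it is not the RELV algorithm and not necessarily the exact statistic used in the paper. The function name `zscore_ranking`, the choice to exclude a gene's own knockout from its column statistics, and the toy data are all assumptions.

```python
from statistics import mean, pstdev


def zscore_ranking(knockout):
    """Rank candidate edges i -> j by |z|, highest first.

    knockout[i][j] is the expression of gene j when gene i is knocked out
    (assumed layout: square matrix, one knockout experiment per gene).
    """
    n = len(knockout)
    scores = []
    for j in range(n):
        # Statistics of gene j over all experiments except its own knockout
        # (an assumption of this sketch, to avoid the trivial self-effect).
        column = [knockout[i][j] for i in range(n) if i != j]
        mu, sigma = mean(column), pstdev(column)
        if sigma == 0:
            continue  # gene j never changes: no evidence for any regulator
        for i in range(n):
            if i != j:
                z = abs(knockout[i][j] - mu) / sigma
                scores.append((z, i, j))
    scores.sort(reverse=True)
    return [(i, j, z) for z, i, j in scores]


# Hypothetical toy data: knocking out gene 0 lowers gene 1; nothing else responds.
toy = [
    [0.0, 0.2, 1.0, 1.0],
    [1.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0, 0.0],
]
top_regulator, top_target, _ = zscore_ranking(toy)[0]
print(top_regulator, top_target)  # 0 1
```

Because the score only measures how strongly gene j responds to the knockout of gene i, a downstream gene two steps away in a cascade can score as highly as a direct target, which is the direct-versus-indirect ambiguity the abstract describes.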

Analysis of among-site variation in substitution patterns

Author: A Reyes
AM Pedersen
C Lanave
DD Pollock
DL Swofford
DM Robinson
H Akaike
J Sullivan
JA Rice
JD Thompson
JJ Faith
JP Bielawski
JP Huelsenbeck
KP Burnham
LA Frederico
LA Frederico
M Hasegawa
M Tanaka
MP Francino
R Nielsen
Z Yang
Z Yang
Z Yang
Z Yang
Publication venue: Biological Procedures Online
Publication date: 01/01/2004

Substitution patterns among nucleotides are often assumed to be constant in phylogenetic analyses. Although variation in the average rate of substitution among sites is commonly accounted for, variation in the relative rates of specific types of substitution is not. Here, we review details of methodologies used for detecting and analyzing differences in substitution processes among predefined groups of sites. We describe how such analyses can be performed using existing phylogenetic tools, and discuss how new phylogenetic analysis tools we have recently developed can be used to provide more detailed and sensitive analyses, including study of the evolution of mutation and substitution processes. As an example we consider the mitochondrial genome, for which two types of transition deaminations (C⇒T and A⇒G) are strongly affected by single-strandedness during replication, resulting in a strand asymmetric mutation process. Since time spent single-stranded varies along the mitochondrial genome, the differential mutational response of these two transition types results in very different substitution patterns in different regions of the genome.

Springer - Publisher Connector

The Escherichia coli transcriptome mostly consists of independently regulated modules

Author: A Anand
A Biton
A Delorme
A Frigyesi
A Hyvärinen
A Santos-Zavaleta
A-M Martoglio
AE Teschendorff
B Dalrymple
B Langmead
B-K Cho
B-K Cho
BM Bolstad
C Vijayendran
CL Turnbough Jr
D Kim
D Marbach
D Risso
D-S Huang
DS Latchman
E Nudler
EJ O’Brien
ENCODE Project Consortium.
ER Gansner
F Pedregosa
GI Guzmán
GI Guzmán
H Zou
HS Rhee
I Kristoficova
IM Keseler
J Pouyssegur
J Utrilla
JE Galagan
JJ Faith
JM Buescher
JM Engreitz
JM Monk
JT Leek
K Valgepea
K-K Yan
KF Jensen
KJ Karczewski
L Wang
M Ester
M Kim
M Lawrence
M Moretto
M Scott
M Scott
MB Gerstein
MI Love
NE Lewis
O Alter
P Chiappetta
P Comon
PR Subbarayan
PV Phaneuf
R De Smet
R Kolter
RA LaCroix
RB D’agostino
S Gama-Castro
S Lin
SJ Larsen
SW Seo
T Baba
T Barrett
TM Henkin
W Kong
W Liebermeister
W Saelens
X Zhang
Xin Fang
XW Zhang
Y Gao
Y Yamanaka
Z Wang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Underlying cellular responses is a transcriptional regulatory network (TRN) that modulates gene expression. A useful description of the TRN would decompose the transcriptome into targeted effects of individual transcriptional regulators. Here, we apply unsupervised machine learning to a diverse compendium of over 250 high-quality Escherichia coli RNA-seq datasets to identify 92 statistically independent signals that modulate the expression of specific gene sets. We show that 61 of these transcriptomic signals represent the effects of currently characterized transcriptional regulators. Condition-specific activation of signals is validated by exposure of E. coli to new environmental conditions. The resulting decomposition of the transcriptome provides: a mechanistic, systems-level, network-based explanation of responses to environmental and genetic perturbations; a guide to gene and regulator function discovery; and a basis for characterizing transcriptomic differences in multiple strains. Taken together, our results show that signal summation describes the composition of a model prokaryotic transcriptome.

eScholarship - University of California

ScholarWorks@UNIST

Online Research Database In Technology

Origin of Co-Expression Patterns in E.coli and S.cerevisiae Emerging from Reverse Engineering Algorithms

Author: A Pothen
AJ Butte
B Snel
BA Cohen
BB Tuch
BP Tu
C Sabatti
Claudio Altafini
D Hwang
D Lin
D Thieffry
Daniele Bianchini
G Butland
H De Jong
H Kim
H Salgado
I Lee
I Yanai
J Forster
J Ihmels
J Korbel
J Reed
J Wu
JJ Faith
JJ Faith
K Basso
LJ Lu
M Arifuzzaman
M Bansal
M Levine
Mark Isalan
Mattia Zampieri
MJ Herrgård
N Simonis
N Soranzo
Nicola Soranzo
NM Luscombe
R Hershberga
R Jansen
R Jansen
R Kothapalli
S Balaji
S Teichmann
S Wuchty
SA Teichmann
TS Gardner
X Gan
Y Bilu
Y Chen
Y Kang
Y Qi
Y Yamanishi
Z Li
Publication venue: Public Library of Science
Publication date: 01/01/2008

BACKGROUND: The concept of reverse engineering a gene network, i.e., of inferring a genome-wide graph of putative gene-gene interactions from compendia of high-throughput microarray data, has been extensively used in the last few years to deduce/integrate/validate various types of "physical" networks of interactions among genes or gene products. RESULTS: This paper gives a comprehensive overview of which of these networks emerge significantly when reverse engineering large collections of gene expression data for two model organisms, E. coli and S. cerevisiae, without any prior information. For the first organism the pattern of co-expression is shown to reflect in fine detail both the operonal structure of the DNA and the regulatory effects exerted by the gene products when co-participating in a protein complex. For the second organism we find that direct transcriptional control (e.g., transcription factor-binding site interactions) has little statistical significance in comparison to the other regulatory mechanisms (such as co-sharing a protein complex, co-localization on a metabolic pathway or compartment), which are however resolved at a lower level of detail than in E. coli. CONCLUSION: The gene co-expression patterns deduced from compendia of profiling experiments tend to unveil functional categories that are mainly associated with stable bindings rather than transient interactions. The inference power of this systematic analysis is substantially reduced when passing from E. coli to S. cerevisiae. This extensive analysis provides a way to describe the different complexity between the two organisms and discusses the critical limitations affecting this type of methodology.